feat: build and use python-build-standalone with official builds #25969

Merged
merged 19 commits into from
Feb 13, 2025

jdstrand
Contributor

@jdstrand jdstrand commented Feb 4, 2025

This PR enables the processing engine for:

  • Linux amd64/arm64 (tar.gz, deb and rpm)
  • Darwin arm64 (tar.gz, brew (future))
  • Windows amd64 (zip)
  • Docker (Linux amd64)

It necessarily removes support for MUSL builds (see README_processing_engine.md in this PR). Once we have Linux builds on an older glibc (#26010), this should largely not be an issue (again, see the README_processing_engine.md for details).

The PR temporarily adjusts .circleci/config.yaml to build all of the above artifacts before committing to main so testing can be done by fetching the build artifacts from the 'Artifacts' tab of the build-packages job. These temporary changes are clearly marked with 'TEMPORARY' in the .circleci/config.yaml file and will be removed before committing.

The Rust code changes are intentionally of 'POC' quality, as there are a few hacks to set up a few things in order to have a good user experience. Since I was sure that these would not remain, I didn't write tests for this code. Thankfully the code changes are minimal (thanks to @jacksonrnewhouse's fine work) and easy to understand, so they can be cleaned up in a subsequent PR. If you'd prefer me to make these changes now, please let me know how you'd like it and I'll take care of it.

I suggest reading README_processing_engine.md now to understand why things are the way they are, then come back here and resume your review.

Important things:

  • the Linux release builds are built on a rather new distribution (Debian Bookworm) with glibc 2.36, so they aren't very portable (Debian Bookworm and Ubuntu 24.04 are good to go; Ubuntu 22.04 is confirmed to work even though it ships glibc 2.35). I filed "Update ci-support to link against more compatible GLIBC" #26010, assigned it to myself, and will update https://github.com/influxdata/ci-support to address this (see README_processing_engine.md for lots of details). Darwin and Windows are not affected and should have great portability
  • .circleci/scripts/package-validation/validate temporarily uses rpm -ivh --nodeps until we build on an older glibc - Remove rpm validation workaround once have (more) portable GLIBC #26011
  • we're doing something slightly hacky and using install_name_tool and patchelf to avoid a wrapper script. Ideally this would be done by the linker, but I couldn't figure out how and I think we're limited by how python-build-standalone is built anyway. While it is hacky, it works and is what the tools are for, so there is no pressing need to fix this
  • the Windows zip has copies of a few DLLs alongside the influxdb3 binary, which isn't the prettiest (see README_processing_engine.md). It works though, so no pressing need to fix
  • influxdb3/src/main.rs has some hacks to help find things. This is clearly not the right place for this code and there may be ways to not need it at all (Clean up runtime code for finding/using standalone python and entering a venv #26012). Specifically:
    • set_pythonhome() sets PYTHONHOME based on where it finds the runtime. I kinda feel like this shouldn't be needed, but the concept of it isn't terrible (even if the location in the code and probably the code itself is)
    • set_pythonpath() sets PYTHONPATH and is required on Windows for some reason. This should be fixed properly
  • influxdb3_processing_engine/src/virtualenv.rs has a few changes (but not enough):
    • get_python_version() is adjusted to use python on Windows as python3 doesn't exist there
    • initialize_venv() is adjusted to use Scripts/activate and call cmd.exe on Windows
  • despite the above, --virtual-env-location doesn't work on any platform. Linux and OSX can be made to work with venv using source /path/to/venv/bin/activate and launching influxdb3 serve ... under it (client doesn't need this). Windows needs to set PYTHONPATH=\path\to\venv\Lib\site-packages. For this and the virtualenv.rs stuff, I filed Clean up runtime code for finding/using standalone python and entering a venv #26012 and assigned to @jacksonrnewhouse
  • the .circleci/config.yaml file is wasteful as we're always fetching the same python-build-runtimes. It would be better to save them off in circleci and only refetch if things change. Tracked in "Cache fetched python-build-standalone all.tar.gz tarball in circleci to avoid always fetching" #26013
  • currently using python-build-runtime 3.11.11. This is partly because it is what I've been testing but it is also the version of python in Debian Bookworm, so it is a good test to make sure the linking and path lookups are functioning correctly. Upgrading is easy (see README_processing_engine.md) and can be done in a follow-up PR
  • only pip was tested, not uv, because python-build-standalone has pip built in: use /path/to/python -m venv <name> to create the venv, then activate it in the normal way and use pip (note: Windows needs python -m pip ... instead; see README_processing_engine.md). uv can be installed by pip if desired. See Consider integrating uv with standalone python builds #26016
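
For reviewers who want a picture of what the set_pythonhome() bullet above is getting at, here is a minimal sketch of the lookup idea. It is not the code in influxdb3/src/main.rs: the function name `candidate_python_home` and the `is_windows` parameter are hypothetical, and the directory layout (a `python` directory next to the binary on Linux/OSX, the binary living inside the unpacked runtime on Windows) is an assumption inferred from the description in this PR.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical helper: guess where the bundled Python runtime lives,
/// given the path to the running binary.
///
/// Assumed layout (inferred from the PR text, not from the real code):
///   - Linux/OSX: <directory of the binary>/python
///   - Windows:   the binary sits inside the unpacked runtime, so the
///                directory of the binary is the runtime itself
pub fn candidate_python_home(exe: &Path, is_windows: bool) -> Option<PathBuf> {
    // `parent()` is None for a root path, in which case there is no candidate.
    let exe_dir = exe.parent()?;
    if is_windows {
        Some(exe_dir.to_path_buf())
    } else {
        Some(exe_dir.join("python"))
    }
}
```

A caller would typically resolve the binary with `std::env::current_exe()`, check that the candidate directory exists, and only then export it as PYTHONHOME before the interpreter is initialized.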

Finally, while we may want to change the name of the system-py feature, I feel it still has a place for local builds, distribution packages, custom container builds, etc and should be optional by default. See README_processing_engine.md for details.

Testing performed:

  • Linux amd64/arm64 (tar.gz, deb) works in non-activated and activated venv
  • Darwin arm64 works in non-activated and activated venv
  • Windows works in non-activated venv. Requires set PYTHONPATH=\path\to\venv\Lib\site-packages to use the venv (expected, see above)
  • also see Add e2e tests for the processing engine #26017
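
As an illustration of the Windows workaround noted above, the PYTHONPATH value can be derived from the venv location. This is only a sketch: `windows_venv_pythonpath` is a hypothetical helper, not something in virtualenv.rs, and it assumes the `Lib\site-packages` layout mentioned in this PR.

```rust
use std::ffi::OsString;
use std::path::Path;

/// Hypothetical helper: build the environment variable pair that makes a
/// Windows venv's packages visible, i.e. PYTHONPATH=<venv>\Lib\site-packages.
pub fn windows_venv_pythonpath(venv: &Path) -> (&'static str, OsString) {
    // Join component by component so the platform's separator is used.
    let site_packages = venv.join("Lib").join("site-packages");
    ("PYTHONPATH", site_packages.into_os_string())
}
```

The returned pair could be passed to `std::process::Command::env` when launching `influxdb3 serve ...`, which has the same effect as running `set PYTHONPATH=...` in the shell first.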

@jdstrand jdstrand added the v3 label Feb 4, 2025
@jdstrand jdstrand force-pushed the jdstrand/pe-standalone-poc branch 2 times, most recently from 782597a to 3513c49 Compare February 4, 2025 22:36
@jdstrand jdstrand marked this pull request as draft February 4, 2025 22:37
@jdstrand jdstrand force-pushed the jdstrand/pe-standalone-poc branch 3 times, most recently from 4d064e2 to 917ef17 Compare February 4, 2025 22:42
@jdstrand jdstrand changed the title DRAFT: feat: build and use python-build-standalone with official builds feat: build and use python-build-standalone with official builds Feb 4, 2025
@jdstrand jdstrand force-pushed the jdstrand/pe-standalone-poc branch 22 times, most recently from ecc6cb5 to 19271e0 Compare February 10, 2025 18:19
@jdstrand
Contributor Author

jdstrand commented Feb 13, 2025

This can be done, but there are ramifications for the development workflow. Perhaps we can chat after @jacksonrnewhouse has time to digest this? We can do this stuff in a follow-up PR of course.

Fyi, with this PR, local builds that use python-build-standalone can be done like so:

  1. download python-build-standalone and unpack it somewhere
    • get from https://github.com/astral-sh/python-build-standalone/releases
    • based on your host OS, choose one of aarch64-apple-darwin-install_only_stripped.tar.gz, aarch64-unknown-linux-gnu-install_only_stripped.tar.gz, x86_64-pc-windows-msvc-shared-install_only_stripped.tar.gz, x86_64-unknown-linux-gnu-install_only_stripped.tar.gz
  2. create pyo3_config_file.txt to match the unpacked dir and downloaded python version. Eg, if you downloaded and unpacked a 3.11.x version to /tmp/python:
    $ cat ./pyo3_config_file.txt
    implementation=CPython
    version=3.11
    shared=true
    abi3=false
    lib_name=python3.11
    lib_dir=/tmp/python/lib
    executable=/tmp/python/bin/python3.11
    pointer_width=64
    build_flags=
    suppress_build_script_link_lines=false

  3. build with:
    # note: PYO3_CONFIG_FILE must be an absolute path
    $ PYO3_CONFIG_FILE=${PWD}/pyo3_config_file.txt cargo build --features "aws,gcp,azure,jemalloc_replacing_malloc,system-py"

  4. Linux/OSX: patch up the binary to find libpython:
    # linux
    $ patchelf --set-rpath '$ORIGIN/python/lib' ./target/<profile>/influxdb3
    # osx (be sure to match the libpython version with what you downloaded)
    $ install_name_tool -change '/install/lib/libpython3.11.dylib' '@executable_path/python/lib/libpython3.11.dylib' ./target/<profile>/influxdb3

  5. Linux/OSX: put the python runtime in the expected location (XXX: may be
    possible at run time to see where the libpython we are using is and adjust
    the code to base the location of the runtime on that). Eg, if unpacked to
    /tmp/python:
    $ test -e ./target/<profile>/python || ln -s /tmp/python ./target/<profile>/python

  6. run with:
    $ mkdir -p /path/to/plugin/dir
    # linux and osx (if it can't find libpython or the runtime, check the previous steps)
    $ ./target/<profile>/influxdb3 ... --plugin-dir /path/to/plugin/dir
    # windows requires moving the binary into the python-build-standalone unpack directory
    $ cp ./target/<profile>/influxdb3 \path\to\python-standalone\python
    # run influxdb with
    $ \path\to\python-standalone\python\influxdb3.exe ... --plugin-dir \path\to\plugin\dir


(I added this to README_processing_engine.md; it's of course possible to smooth this out with a helper script)
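
As one possible direction for such a helper, the pyo3_config_file.txt from the steps above could be generated rather than written by hand. The sketch below is only an illustration: `render_pyo3_config` is a hypothetical function, it covers only the Linux/OSX layout (`lib/` and `bin/pythonX.Y` under the unpack directory), and it simply reproduces the keys shown in the example file.

```rust
use std::path::Path;

/// Hypothetical helper: render a pyo3 config file for a python-build-standalone
/// runtime unpacked at `prefix` (Linux/OSX layout assumed).
pub fn render_pyo3_config(prefix: &Path, major: u8, minor: u8) -> String {
    let lib_dir = prefix.join("lib");
    let executable = prefix.join("bin").join(format!("python{major}.{minor}"));
    // Same keys, in the same order, as the hand-written example file.
    format!(
        "implementation=CPython\n\
         version={major}.{minor}\n\
         shared=true\n\
         abi3=false\n\
         lib_name=python{major}.{minor}\n\
         lib_dir={}\n\
         executable={}\n\
         pointer_width=64\n\
         build_flags=\n\
         suppress_build_script_link_lines=false\n",
        lib_dir.display(),
        executable.display(),
    )
}
```

Writing the result to a file and pointing PYO3_CONFIG_FILE at its absolute path would replace the manual second step; the patchelf/install_name_tool and symlink steps would still be needed afterwards.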

@jdstrand jdstrand force-pushed the jdstrand/pe-standalone-poc branch from 2b00534 to 3998d2e Compare February 13, 2025 19:26
Contributor

@jacksonrnewhouse jacksonrnewhouse left a comment

Overall looks good, just have a few comments.

README_processing_engine.md (outdated, resolved)
README_processing_engine.md (outdated, resolved)
@@ -17,7 +17,14 @@ pub enum VenvError {
 }
 
 fn get_python_version() -> Result<(u8, u8), std::io::Error> {
-    let output = Command::new("python3")
+    // linux/osx have python3, but windows only has python. Use python since it is in all of them
+    let python_exe = if cfg!(target_os = "windows") {
Contributor

What about factoring this out into a constant?

Contributor Author

I'm fine with that but considering more needs to happen here (#26012, currently assigned to you), I wonder if that can be in a follow-up PR?
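
For reference in that follow-up, one possible shape for the suggested constant is sketched below. This is not the merged code: `PYTHON_EXE` and `parse_python_version` are hypothetical names, the executable names follow the PR description (python on Windows, python3 elsewhere), and the parser assumes the usual `Python X.Y.Z` output of `python --version`.

```rust
/// Hypothetical constant: interpreter name per platform, chosen at compile time.
pub const PYTHON_EXE: &str = if cfg!(target_os = "windows") {
    "python"
} else {
    "python3"
};

/// Hypothetical helper: extract (major, minor) from `--version` output such as
/// "Python 3.11.11". Returns None if the text does not have that shape.
pub fn parse_python_version(output: &str) -> Option<(u8, u8)> {
    let rest = output.trim().strip_prefix("Python ")?;
    let mut parts = rest.split('.');
    let major = parts.next()?.parse().ok()?;
    let minor = parts.next()?.parse().ok()?;
    Some((major, minor))
}
```

With a constant like this, get_python_version() could call `Command::new(PYTHON_EXE)` and the same name could be reused by initialize_venv(), which keeps the platform check in one place.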

@jdstrand jdstrand force-pushed the jdstrand/pe-standalone-poc branch from 626d947 to eb56f3e Compare February 13, 2025 22:01
@jdstrand jdstrand force-pushed the jdstrand/pe-standalone-poc branch from eb56f3e to 21a6b6c Compare February 13, 2025 22:04
@jdstrand
Contributor Author

Fyi, I incorporated the two doc fixes from @jacksonrnewhouse and removed the TEMPORARY bits from .circleci/config.yaml by dropping the commit and force-pushing.

@jdstrand
Contributor Author

What about factoring this out into a constant?

I'm fine with that but considering more needs to happen here (#26012, currently assigned to you), I wonder if that can be in a follow-up PR?

In the interest of time, I'm going to merge this without this code change and then keep an eye on main builds. If this is something you want me to address separately from the changes needed to address #26012, I can do a follow-up PR.

@jdstrand jdstrand merged commit ccd5d22 into main Feb 13, 2025
12 checks passed